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Description 

Technical Field 

5 The present invention relates to a method of searching a three<limensional structure databases, which can be uti- 
lized for finding novel lead-structures of medicaments, pesticides, and other biologically active compounds. 

Background Art 

10 Medicaments generally exhibit their biological activities through strong interactions with target biopolymers. In 
recent years, three-dimensional structures of biopolymers, which play important roles in living bodies, have been 
revealed successively on the basis of progresses in X-ray crystailographic analyses and NMR technology. With the 
advance of these researches, methodologies of theoretical approach based on information about three-dimensional 
structures of biopolymers and their successful results have been reported with respect to lead generations of new 

75 drugs, which were conventionally achieved mostly by random screenings or accidental discoveries. Importance of such 
approach is recognized with increasing interest, and researches from various viewpoints have been conducted in all the 
world. 

An object of these researches is to provide a process for creation of ligand candidate structures by means of a com- 
puter. Another object is to provide a process for searching for ligand structures from a three-dimensional structure data- 

20 base containing known compounds. The former (automatic structure construction approach) is advantageous in that 
the process can suggest a wide variety of possible ligand structures irrespective of known or unknown structures. On 
the other hand, the latter (database approach) is used in searching for novel biologically active compounds from data- 
bases comprised of accumulated structures of known compounds. When a search is applied to a database which con- 
sists of compounds stocked in an own company or commercially available compounds, this method is particularly 

25 advantageous in that compounds found to reach criteria of hit ("hit compounds") can be obtained without any synthetic 
efforts, and their association constants to target biopolymers and biological activities can be measured. 

Synthesis of a compound with a novel molecular skeleton generally needs several months even for an expert, and 
accordingly, years of and burden of work are required to synthesize a lot of promising structures and measuring their 
activities. However, if hit compounds are already available, measurements of association constants and biological activ- 

30 ities can be easily performed for dozens of promising ligand candidate compounds. In addition, based on the structure 
of a compound found to have a considerable extent of high association constant or activity, an efficient lead generation 
is achievable by designing compounds with higher association constants and biological activities and synthesizing the 
compounds. For these reasons, approaches have recently been interested which comprises the steps of searching 
three-dimensional structure databases consisting of known compounds and discovering novel pharmacologically active 

35 compounds. 

However, one of problems of this method is by what query the search is carried out. Generally, a three-dimensional 
structure database is searched by queries of whether or not given atomic groups or functional groups exist which are 
presumed as essential for the desired activity in typical biologically active compounds used as templates, or alterna- 
tively, whether or not their relative positions are similar to those of the template compounds. Where the structure of a 

40 target biopolymer is unknown, basically no approach can be relied on other than this process. However, since these 
queries are based on assumptions and hypotheses, hit compounds may often fail to have the same activity as that of 
the template compounds. Where two or more molecules with quite different structural features exhibit similar activities, 
it can only become possible to create more probable operating hypotheses, with reference to functional groups essen- 
tial for the activity and their relative positions, by means of structural information about these molecules. 

45 The most difficult problem in the search of three-dimensional structure databases lies in the handling of conforma- 
tional flexibility of compounds. It has been known that the conformations of a ligand as it binds to a biopolymer (i.e., the 
most stable complex is formed) do not necessarily accord with any of stable structures of the molecule itself, in the state 
of a crystal or a solution, or a structure with the most stable energy obtained by energy calculation, and that one ligand 
molecule can form stable complexes in different conformations depending on target biopolymers. Generally, a set of 

50 atomic coordinates representing a three-dimensional structure of one conformation among multiple possible conforma- 
tions is contained as for one compound in a database. Furthermore, except for databases derived from crystal struc- 
tures, information of three-dimensional structure of each compound is obtained from two-dimensionally inputted 
structure through calculation. These three-dimensional structures often represent one of local minimum structures that 
can be taken by the molecule, per se. 

55 Therefore, if a search is carried out merely on the basis of conformations contained in a database to judge whether 
or not compounds meet the criteria of hit, most compounds fail to be selected which should be hit if other conformations 
are considered. Although the number of conformations to be considered may vary depending on the number of rotata- 
ble bonds, as well as factors such as degrees of precise consideration of conformations, several tens to hundreds of 
thousands conformations should be taken into account for a moderately flexible molecule containing 3 to 6 rotatable 
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bonds. In order to consider these possible conformations, available processes are limited to either of those comprising 
the steps of selecting promising conformations and inputting them in a database beforehand, or alternatively, generat- 
ing conformations and examining at the time of conducting a search. In any events, enormous computer resources and 
calculation time are required. 

s Recently, Kearsley et al. with Merck prepared a database which comprises 20 conformations at most for a single 

compound and reported "FLOG," a searching system in which a search is carried out using searching standard mainly 
consisting of relative positions of functional groups (M.D. Miller, S.K. Kearsley, D.J. Underwood/and R.R Sheridan: 
Journal of Computer-Aided Molecular Design, 8. 1994, pp. 153-1 74, FLOG: A system to select 'quasi-flexible , ligands 
complementary to a receptor of known three-dimensional structure). It was reported that the FLOG took approximately 

10 one week to complete searching a database containing 2,000,000 conformations for 1 00,000 compounds by means of 
a super computer CRAY The twenty conformations in average for each compound are stored by selecting energetically 
stable local minimum structures through prior conformational analysis of each compound, for which enormous time and 
burden of work are needed. Nevertheless, twenty conformations per a single compound are insufficient. 

In addition, the method adopts the query which comprise the presence or absence of functional groups presumably 

15 essential for the activity of a compound as a template, as well as distance, angle, or direction between the groups. 
These queries are most plain among possible queries, and although they have advantages to shorten calculation time 
using simple algorithm, high probability cannot be expected that hit compounds actually have the desired activity. The 
reason lies in that a molecule having inappropriate whole molecular shape and size fails to exhibit activity, even if func- 
tional groups merely have desired relative positions. If a three-dimensional structure of the target biopolymer is known, 

20 the most effective search can be achieved by means of such information, which provides high probability that hit com- 
pounds have the desired activity. 

For example, Eyermann et al. with Dupont Merck performed a database search based on relative positions of func- 
tional groups of ligand molecules which had been analyzed by X-ray crystallography as a complex with a biopolymer 
(RY.S. Lam, RK. Jadhav, C.J. Eyemann, C.N. Hodge, Y. Ru. LT. Bacheler, J.L Meek, M.J. Otto, M.M.. Rayner, Y.N. 

25 Wong, C.H. Chang, P.C. Weber, DA Jackson, T.R. Sharp, and S. Erickson-Viitanen: Science, 263. 1994, pp.380-384, 
Rational design of Potent, Bioavailable, Nonpeptide Cyclic Ureas as HIV Protease Inhibitors.). In the HIV protease sys- 
tem, compounds were searched for as to criteria where the three positions of an oxygen atoms in water molecules are 
in similar relative positions which connect the protease to the ligand through hydrogen bonds at the centers of two ben- 
zene rings in the peptide ligand molecule. Among the hit compounds, one compound was selected which well fitted into 

30 the ligand-binding region and formed hydrogen bonds other than those formed by the oxygen atoms used as query. 
They reported that a compound with high activity was developed based on the structure of this compound by repeated 
works of synthesizing compounds with appropriate modifications while measuring inhibitory activities against the 
enzyme. However, the publication lacks detailed descriptions as for the search process, and their technique of handling 
conformations is not deducible. 

35 The search query capable of providing the highest probability that hit compounds have desired activity is that 
whether or not compounds stably bind to a ligand-binding region of the target biopolymer. Although in some cases the 
desired activity fails to be exhibited due to physicochemical factors such as water solubility even if a compound satisfies 
the criteria, reaching the criteria is an essential requirement for the expression of activity. On the other hand, the pres- 
ence or absence of specific functional groups and the relative positions of functional groups are not necessarily essen- 

40 tial for the expression of activity, and even if a compound falls within a different family and has dissimilar skeletal 
structure to the known active structures, the compound may possibly exhibit the same activity if it can form a complex 
between the target biopolymer at a similar extent of stability. Therefore, compounds satisfying the criteria that they can 
stably bind to a ligand-binding region of the target biopolymer have high probability of being actual candidates as lig- 
ands. In other words, by using the criteria of fitting to a ligand-binding region of the target biopolymer, it becomes pos- 

45 sible to identify compounds which exhibit the desired biological activity and have a wide variety of structures from 
databases. 

In order to determine whether or not a given ligand molecule can bind to the target biopolymer, it is necessary to 
find the most stable docking structure among possible docking structures and to know the degree of stability of the 
docking structure. Also, in order to find the most stable docking structure between a given biopolymer and a ligand mol- 

50 ecule, it is necessary to estimate stabilities of all docking structures by considering all possible binding modes (corre- 
sponding to rotation or translation of one molecule while fixing another compound), and possible ligand conformations. 
However, since the process needs enormous calculation, it cannot be achieved by interactive methods using a graphic 
display. Therefore, the development of an automatic docking method has been desired which can perform the above 
process automatically and efficiently. 

55 As automatic docking methods, Kuntz et al. developed a method for estimating possible binding modes by repre- 
senting the shape of a ligand-binding region by means of several to dozens of inscribed spheres and comparing a set 
of vectors formed by the centers of the spheres with a set of intramolecular vectors of a ligand compound (R.L Desjar- 
lais, R.R Sheridan, G.L Seibel. J.S. Dixon, I.D. Kuntz and R. Venkataraghavan: J. Med. Chem. 31 (1989) 722-729). 
Using Shape Complementarity as an Initial Screen in Designing Ligands for a Receptor-Binding Site of Known 3- 
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Dimensional Structure.). In a recent improved process, the process is modified so that intermolecular energy can be 
calculated in addition to the score representing coincidence between the vectors. 

However, this method is disadvantageous because it needs an undue time even only for covering all binding 
modes. Some improved processes include alternations so that conformations can be varied for the docking of a single 

5 compound. However, the processes are not designed so as to provide an automatic procedure covering all of the dock- 
ing processes. Although energy evaluation can be done concerning hit compounds, only simple scores can be calcu- 
lated and hydrogen bonds or the like are disregarded during the docking process. Furthermore, its accuracy is 
insufficient, i.e., there is a problem in that, for example, the true docking structure determined by crystal analysis fails 
to be judged as the highest rank and is sometimes judged as rather low in rank. Moreover, its unavoidable defect is in 

10 that the process is carried out fundamentally by a fixed conformation for each compound, and conformational flexibili- 
ties cannot be handled. In a manual operation cannot compensate for the above defect especially, its practical utility is 
low in a three-dimensional database search which requires full automatic operation for numerous compounds. 

The inventors of the present invention developed "ADAM," a method for estimating the most stable docking struc- 
tures of a biopolymer and a ligand molecule automatically and without any preconception, and they successfully solved 

15 all of the aforementioned problems (PCT/JP93/0365, PCT International Publication WO 93/20525; M. Yamada and A. 
Itai, Chem. Pharm. Bull., 41, p.1200, 1993; M. Yamada and A. Itai, Chem. Pharm. Bull., 41, p.1203, 1993; and M. 
Yamada Mizutani, N. Tomioka and A. Itai, J. Mol. Biol., 243, pp.310-326, 1994). 

In the applications of "ADAM" to several enzyme-inhibitor systems whose structures had been elucidated by crys- 
tallographic analyses, all the resulted docking models with minimum energy perfectly reproduced the structure in the 

20 crystal, and torsion angles of rotatable bonds in the ligand molecule and hydrogen-bonding schemes were also accu- 
rately reproduced. The high accuracy of "ADAM" is based on the calculations of structure optimization (energy minimi- 
zation) performed altogether three times. Though the time required may vary depending on the performance of a 
computer, numbers of dummy atoms, number of heteroatoms, number of rotatable bonds and the like and cannot be 
generalized, it takes from several minutes through approximately 1 hour to obtain an initial docking model with an ordi- 

25 nary drug molecule by means of a commonly used workstation (R4400). This speed is dozens of times faster than the 
fastest method which handles possible conformations reported so far. 

However, although ADAM is suitable in searching for the most stable docking structure between the target biopol- 
ymer and a single ligand molecule, the method is not suitable for searching novel ligand compounds from a wide variety 
of numerous compounds contained in a three-dimensional structure database as it is. Therefore, further improvement 

30 has been desired. 

Accordingly, an object of the present invention is to provide a method suitable for searching novel ligand com- 
pounds from a wide variety of numerous compounds, thereby solving the problems of the state of the art. 

Description of the Invention 

35 

The inventors of the present invention performed various researches. As a result, they succeeded in developing a 
process for selecting one or more promising ligand candidates from an enormous number of trial compounds: wherein 
the process comprises the steps of estimating the most stable docking structure for each trial compound; and selecting 
one or more ligand candidate compounds from all the trial compounds by given criteria including the interaction energy 
40 between the target biopolymer and the trial compound in the most stable docking structures; and the aforementioned 
step of estimating the most stable docking structure for each trial compound comprises the steps of: 

evaluating all possible docking structures generated through docking of the trial compound to the biopolymer, while 
varying conformation of the trial compound and repeating structural optimizations, based on interaction energies 
45 between the biopolymer and the trial compound, e.g.. hydrogen bonds, electrostatic interaction, and van der Waals 
force. The present invention was achieved on the basis of these findings. 

The present invention thus provides a method of searching one or more ligand compound to a target biopolymer 
from a three-dimensional structure database, which comprises the steps of: 

50 

(i) the first step of assigning hydrogen-bonding category numbers, information for calculating force-field energies, 
and information for generating conformations, to two or more trial compounds, in addition to three-dimensional 
atomic coordinates thereof; 

(ii) the second step of preparing physicochemicai information about a ligand-binding region and one or more 
55 dummy atoms based on the three-dimensional atomic coordinates of the target biopolymer; 

(iii) the third step of estimating the most stable docking structure, wherein said step further comprises the steps of 
generating possible docking structures by docking a trial compound to the biopolymer while varying conformations 
of the trial compound, and evaluating interaction energy between the target biopolymer and the trial compound, 
based on the hydrogen-bonding category number, the information for calculating force-field energies, and the infor- 
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mation for generating conformations assigned to the trial compound in addition to the three-dimensional atomic 
coordinates according to the step (i) and based on the physicochemical information about the ligand-binding region 
prepared in the step (ii):and then ; 

(iv) the fourth step of deciding whether or not the trial compound should be adopted as a ligand-candidate com- 
s pound based on given criteria including the interaction energy values between the target biopolymer and the trial 

compound in the most stable docking structure found according to the step (iii); and 

(v) the fifth step of repeating the step (iii) and the step (iv) for all of the trial compounds. 

The method of the present invention may comprise the sixth step of further selecting the ligand-candidate com- 
10 pounds selected in the step (iv) based on different criteria including the number of hydrogen bonds formed between the 
target biopolymer and/or interaction energy values derived therefrom. 

Brief Description of Drawings 

15 Figure 1 shows a conceptual scheme by featuring the method of the present invention. 

Figure 2 is a flowchart according to an embodiment of the method of the present invention. In the figure, S repre- 
sents each step. 

Figure 3 shows examples of the substrate and known inhibitor analogues for a dihydrofolate reductase selected as 
ligand candidate compounds by the method of the present invention. 
20 Figure 4 shows examples of ligand-candidate compounds with novel structures for a dihydrofolate reductase 
obtained by the method of the present invention. 

Best Mode for Carrying out the Invention 

25 In the method of the present invention, the trial compounds are not particularly limited so long as their structures 
are known. For example, various compounds contained in currently available databases can be used. The hydrogen- 
bonding category numbers are identification numbers for functional groups capable of forming hydrogen bonds, and 
assigned to heteroatoms that directly participate in hydrogen bonds by the functional groups. By referring to the 
number, a geometric structure of the functional group and the nature of the hydrogen bond as well as the position of 

30 partner hydrogen-bonding atoms (dummy atoms) are instantly generated. The information for calculating force field 
energies includes an atom type number to discriminate atomic element and the state of hybridization of each atom that 
are used for calculating intramolecular and intermolecular interaction energies by means of molecular force field. The 
information includes an atomic charge assigned to each atom. The information for preparing conformations is used for 
systematically generating different conformations by varying torsion angles of rotatable bonds, in the range of the given 

35 initial and final values of the torsion angles by given values of step angle. H includes, for one bond to be rotated, a set 
of four atomic numbers constituting the torsion angle, and the initial and final values of the torsion angle and a step 
angle of rotation. 

TTie term "biopolymer" means biological macromolecules and includes artificial mimetic molecules based on bio- 
logical macromolecules. The term "physicochemical information about the ligand-binding region" means potential inf lu- 

40 ence brought by all of the atoms of the biopolymer inside a cavity space of the biopolymer to which a ligand can bind. 
The meaning includes hydrogen-bonding properties of three-dimensional grid points within the ligand-binding region 
(hydrogen bond acceptor or hydrogen bond donor), and van der Waals interaction energies and electrostatic interaction 
energies generated between the biopolymer and probe atoms when the probe atoms are placed on the three-dimen- 
sional grid points. The meaning of "docking" is used to include the step of forming a complex from a trial compound mol- 

45 ecule and a biopolymer, including the step of finding a stable docking structure formed by the both of them. The docking 
can generally be performed by interactive methods using computer graphics, as well as by automatic docking methods 
as the case may be. 

The term "interaction between a biopolymer and a trial compound" is a force arisen between the biopolymer and 
the trial compound, whose examples include hydrogen bond, electrostatic interaction, van der Waals interaction and the 
50 like. The term "ligand" or "ligand compound" is a compound with low molecular weight that binds to a biopolymer, whose 
examples include medicaments, substrates and inhibitors of enzymes, coenzymes, and other biologically-active sub- 
stances. 

According to an embodiment of the method of the present invention, the above third step may comprise the follow- 
ing steps: 

55 

(1) step (a) of searching likely binding modes between the biopolymer and the trial compound by preparing all of 
the combinatorial correspondences between hydrogen-bonding heteroatoms in the trial compound and dummy 
atoms placed at the positions of heteroatoms that can be hydrogen-bonding partners of hydrogen-bonding func- 
tional groups in the biopolymer; 
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(2) step (b) of simultaneously estimating possible hydrogen-bonding schemes between the biopolymer and the trial 
compound and conformations of hydrogen-bonding part of the trial compound by comparing the distances between 
dummy atoms with the distances between the corresponded hydrogen-bonding heteroatoms while systematically 
changing the conformations of the trial compound; and 
5 (3) step (c) of obtaining the structure of a docking structure comprising the biopolymer and the trial compound by 

converting the atomic coordinates of the trial compound to the coordinate system of the biopolymer for each hydro- 
gen-bonding scheme and conformation obtained in the step (b) based on the correspondences between the hydro- 
gen-bonding heteroatoms in the trial compound and the dummy atoms. 

w In addition, according to another embodiment of the method of the present invention, the above third step may com- 
prise the following steps: 

(1) step (a) of searching likely binding modes between the biopolymer and the trial compound by preparing all of 
the combinatorial correspondences between hydrogen-bonding heteroatoms in a partial structure of the trial com- 

15 pound and the dummy atoms placed at the positions of heteroatoms that can be hydrogen-bonding partners of 
hydrogen-bonding functional groups in the biopolymer; 

(2) step (b) of simultaneously estimating hydrogen-bonding schemes between the biopolymer and the trial com- 
pound and conformations of the partial structure of the trial compound by comparing the distances between dummy 
atoms with the distances between the corresponded hydrogen-bonding heteroatoms while systematically changing 

20 the conformations of the trial compound; 

(3) step (c) of preserving correspondences of the dummy atoms and the hydrogen-bonding heteroatoms that pro- 
vide impossible hydrogen-bonding schemes in the partial structure of the trial compound, and hydrogen-bonding 
heteroatoms that cannot form any hydrogen-bond with the dummy atoms based on the hydrogen-bonding schemes 
and conformations obtained in the step (b); 

25 (4) step (d) of searching likely binding modes between the biopolymer and the trial compound by preparing all of 
the combinatorial correspondences between dummy atoms and hydrogen-bonding heteroatoms in a the whole 
structure of the trial compound, excluding the correspondences containing the hydrogen-bonding heteroatoms pre- 
served in the step (c) and the correspondences containing the dummy atoms and hydrogen-bonding heteroatoms 
preserved in the step (c); 

30 (5) step (e) of simultaneously estimating possible hydrogen-bonding schemes between the biopolymer and the trial 
compound and conformations of hydrogen-bonding part of the trial compound by comparing the distances between 
dummy atoms with the distances between corresponded hydrogen-bonding heteroatoms while systematically 
changing the conformations of the trial compound; and 

(6) step (f) of obtaining the structure of a docking structure comprising the biopolymer and the trial compound by 
35 converting the atomic coordinates of the trial compound to the coordinate system of the biopolymer for each hydro- 
gen-bonding scheme and conformations obtained in the step (e) based on the correspondences between the 
hydrogen-bonding heteroatoms in the trial compound and the dummy atoms. 

When the third step includes the above sub-steps, it becomes possible to accelerate the search for stable docking 
40 structures consisting of the biopolymer and the trial compound, and in addition, it also become possible to search for 
stable docking structures including the most stable docking structure in short time even where the biopolymer and/or 
the trial compounds have complicated structures. 

According to a further embodiment of the method of the present invention, the above third step may comprise the 
following steps: 

45 

(1) step (a) of searching likely binding modes between the biopolymer and the trial compound by preparing all of 
the combinatorial correspondences between hydrogen-bonding heteroatoms in the trial compound and dummy 
atoms placed at the positions of heteroatoms that can be hydrogen-bonding partners of hydrogen-bonding func- 
tional groups in the biopolymer; 
so (2) step (b) of simultaneously estimating possible hydrogen-bonding schemes between the biopolymer and the trial 
compound and conformations of hydrogen-bonding part of the trial compound by comparing the distances between 
dummy atoms with the distances between hydrogen -bonding heteroatoms while systematically changing the con- 
formations of the trial compound; 

(3) step (c) of optimizing the conformations of the trial compound so that the positions of the dummy atoms and the 
55 hydrogen-bonding heteroatoms in the trial compound can be in accord with each other while retaining the hydro- 
gen-bonding scheme obtained in the step (b), and then excluding the conformations of the trial compound having 
high intramolecular energies; 

(4) step (d) of obtaining the structure of a complex comprising the biopolymer and the trial compound by converting 
the entire atomic coordinates of the trial compound to the coordinate system of the biopolymer for each conforma- 
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tion not excluded in the step (c) based on the corresponding relationships between the hydrogen-bonding heter- 
oatoms in the trial compound and the dummy atoms; 

(5) step (e) of excluding from docking structures of hydrogen-bonding parts obtained in the step (d) the docking 
structures in which intramolecular energies of the hydrogen-bonding parts of the trial compound and intermolecular 

5 energies between the biopolymer and the hydrogen-bonding parts of the trial compound are high, and then carry- 

ing out structure optimizations of the remaining docking structures; 

(6) step (f) of obtaining new docking structures comprising the whole trial compound by generating the conforma- 
tions of non-hydrogen-bonding parts of the trial compound for each docking structure obtained in the step (e); and 

(7) step (g) of excluding from the docking structures obtained in the step (f) the docking structures in which intramo- 
10 lecular energies of the whole structure of the trial compound and intermolecular energies between the biopolymer 

and the trial compound are high, and then carrying out structure optimizations of the remaining docking structures. 

When the third step includes the aforementioned sub-steps, it becomes possible to select appropriate docking 
structures even when reduced numbers of conformations are initially generated, and thereby obtain stable docking 
15 structures with high accuracy. 

In the above methods, the hydrogen-bonding functional groups include functional groups or atoms that can partic- 
ipate in the formations of hydrogen bonds. The term "hydrogen-bonding heteroatoms" means heteroatoms which exist 
in the hydrogen -bonding functional groups present in the trial compounds. The term "hydrogen -bonding part" means a 
partial structure of the trial compound structure which comprises all the hydrogen-bonding heteroatoms that are corre- 
20 sponded to dummy atoms, and the term "non-hydrogen-bonding part" means a partial structure other than the hydro- 
gen-bonding part. 

According to a still further embodiment of the present invention, there is provided a method for preparing a three- 
dimensional structure database which comprises the step of optionally adding three-dimensional coordinates of hydro- 
gen atoms, and adding atom type numbers, hydrogen-bonding category numbers, atomic charge, and modes of bond 

25 rotation in addition to three-dimensional coordinates of trial compounds. 

Preferred embodiments of the present invention will be explained below by referring to the flowchart shown in Fig- 
ure 2. In Figure 2, the symbol "S" represents each step and the name "ADAM" represents the method for searching a 
stable docking structure of biopolymer and ligand molecules disclosed in International Publication WO93/20525 (Inter- 
national Publication Date: October, 14, 1993). The disclosure of the specification and claims of the International Publi- 

30 cation WO93/20525 is herein incorporated by reference. 

The "ADAM" format three-dimensional database is first prepared which comprises at least two trial compounds. 
The aforementioned database can be prepared, for example, by the method explained below. Three-dimensional coor- 
dinates of trial compounds are first inputted. As the three-dimensional coordinates of the trial compounds, coordinates 
obtained by converting two-dimensional coordinates obtainable from a two-dimensional database to a three-dimen- 

35 sional coordinates through force-field calculation, or coordinates obtained from structural analysis of a single crystal as 
well as those from crystal database, or those from three-dimensional structures obtained by model building procedure 
based on energy calculation may be used. As regards three-dimensional structures of each trial compound, conforma- 
tions are basically not necessary to be taken into account so long as geometric values such as bond distances and 
bond angles between the atoms are accurate. However, in order to calculate atomic charge of each atom in the trial 

40 compound as accurately as possible, it is preferred to input atomic coordinates of a relatively stable structure. 

After inputting the above three<limensional coordinates of the trial compounds, a three-dimensional database can 
be prepared automatically by adding hydrogen atoms if necessary, and assigning a force-field atom-type number to 
each atom, hydrogen-bonding category numbers to hydrogen-bonding heteroatoms, atomic charge to each atom, and 
modes of bond rotations (for example, information about covalent bonds to be rotated, values of initial, final, and incre- 

45 ment torsion angles of the rotation) . 

The force-field atom-type numbers for each atom in the trial compound can be assigned, for example, according to 
the numbering method of Weiner et al. (Weiner, S. J., Kollaman, P.A., Case, D.A., Singh, U.C., Ghio. C, Alagona, G., 
Profeta, S., Jr., and Weiner, R, J. Am. Chem. Soc, 106, 1984, pp.765-784). The hydrogen-bonding category numbers 
for hydrogen-bonding heteroatoms in the trial compound can be assigned according to the following Table 1 . The atomic 

so charge of each atom in the trial compound can be calculated, for example, by the Gasteiger method or by molecular 
orbital calculation with MNDO, AM1 or other method in the MOPAC program. As for the mode of bond rotation, it is pre- 
ferred to indicate to rotate all rotatable bonds systematically with rotation-angle intervals of 60° to 120°. 



55 



7 



EP0 790 567A1 



Table 1 



5 


A List of Hydrogen-Bonding Hereroatoms and their Hydrogen-Bonding Cate- 
gory Numbers included in Hydrogen-Bonding Functional Groups 




Hydrogen-Bonding Cate- 
gory Number 


Heteroatoms included in Hydrogen-Bonding Func- 
tional Groups 




1 


sp 2 N in Primary Amines 


10 


2 


sp 3 N in Primary Amines 




3 


sp 3 N in Ammonium Ions 




4 


sp 2 N in Amides 


15 


5 


sp 3 N in Secondary Amines 




6 


N in Aromatic Groups 




7 


Protonated N in Aromatic Groups 




8 


N in Tertiary Amines 


20 


9 


Protonated N in Tertiary Amines 




10 


0 in Hydroxyls with Rotatable C-O bond 




11 


0 in Ethers 


25 


12 


0 in Carbo f nyls 




13 


0 in Carboxylate Anions 




14 


0 in Carboxylic Acids 


30 


15 


0 in Phosphates 


16 


0 in Water Molecules 




17 


S in Mercaptos 




18 


S in Thioethers 


35 


19 


0 in Hydroxyls with fixed hydrogen atom positions 



Then, the number of atoms in the biopolymer and their atomic coordinates (except for atomic coordinates of hydro- 
gen atoms) are inputted (S2). As regards the atomic coordinates of the biopolymer, three-dimensional information 

40 obtained by X-ray crystallographic analyses or NMR analyses, information obtainable from the protein crystal structure 
database and the like, or alternatively, atomic coordinates of a biopolymer model constructed on the basis of such infor- 
mation or the like may be used. The atomic coordinates of the biopolymer are preferably provided as a three-dimen- 
sional coordinate system. In addition, cofactors that bind to the biopolymer and that structurally as well as functionally 
play important roles may be considered as a part of the biopolymer, and the atomic coordinates of the biopolymer in the 

45 state of being bound with the cofactor may be inputted to perform the successive steps. Examples of the cofactors 
include coenzymes, water molecules, metal ions and the like. 

Following the above step, the atomic coordinates of the hydrogen atoms in amino acid residues present in the 
biopolymer are calculated (S3). In general, it is difficult to determine the positions of hydrogen atoms in biopolymers by 
experimental procedures such as X-ray crystallographic analyses or NMR analyses. In addition, the information about 

so hydrogen atoms is not available from protein crystal structure databases or the like. Therefore, it is necessary to calcu- 
late the atomic coordinates of the hydrogen atoms in amino acid residues on the basis of the structures of the amino 
acid residues present in the biopolymer. For hydrogen atoms whose atomic coordinates cannot be uniquely determined 
as they bind to rotatable group of atoms in the amino acid residues, their atomic coordinates are preferably calculated 
on the assumption that they exist at trans positions. 

55 Subsequently, atomic charges are appended to the atoms in amino acid residues present in the biopolymer (S4), 
and then hydrogen-bonding category numbers are assigned to the heteroatoms in hydrogen-bonding functional groups 
present in the biopolymer (S5). As the values of the atomic charges, reported values calculated for each amino acid, for 
example, values reported by Weiner et al. can be used (Weiner, S.J., Kollman, PA, Case, D.A., Singh, U.C., Ghio, C, 
Alagona, G., Profeta, S., Jr. and Weiner, P., J. Am. Chem. Soc, 106, 1984, pp. 765-784). The hydrogen-bonding cate- 
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gory numbers can be assigned according to the Table 1 . 

A ligand-binding region is then specified (S6). As the ligand-binding region, spaces containing any part of the 
biopolymer can be specified by using preferably a rectangular box. Depending on the purposes, a ligand-binding pocket 
and its surrounding region of the biopolymer can be specified. If desired, a region containing any site in the biopolymer 

5 which binds a different molecule such as an effector and the like can also be specified. The term ligand-binding region" 
means a cavity on the concaved surface of the biopolymer to which a ligand molecule such as a substrate or an inhibitor 
bind. For assigning the range of the ligand-binding region, a part of the functions of the program "GREEN" can be used 
(Tomioka, N., Itai, A. and litaka, Y, Journal of Computer-Aided Molecular Design, vol. 1, 1987, pp. 197-210). 

Three-dimensional grid points are generated inside the ligand-binding region specified in the step S6, and for each 

10 three-dimensional grid point, an address number is assigned and grid point information is calculated (S7). The term 
"three-dimensional grid points" means the points on the three-dimensional lattice generated at a certain regular inter- 
vals within the ligand-binding region in the biopolymer. The meaning of "grid point information" includes local physico- 
chemical character at each grid point in the ligand-binding region such as van der Waals interaction energies and 
electrostatic interaction energies arisen between the biopolymer and probe atoms, as well as hydrogen-binding proper- 

15 ties. Wherein the van der Waals and electrostatic interaction energies are calculated on the assumption that probe 
atoms exist on each three-dimensional grid point. By using the grid point information on the three-dimensional grid 
points, it becomes possible to accelerate approximate calculations of the inter molecular interactions between the 
biopolymer and the trial compound to be carried out in the successive steps. In addition, reasonable determination of 
the positions of the dummy atoms, which is provided in the following steps, becomes achievable. As a result, possible 

20 docking structures comprising the biopolymer and the trial compound can be entirely searched in a short time. 

The three-dimensional grid points may be generated at intervals of 0.3-2.0 angstroms, preferably 0.3-0.5 ang- 
stroms in the region specified in the step S6. As the probe atoms, all sorts of atoms contained in possible ligand com- 
pounds are preferably included. The van der Waals interaction energy that acts between each atom placed on the 
three-dimensional grid points and the biopolymer can be calculated according to a conventional atom-pair type calcu- 

25 lation using an empirical potential function. As the empirical potential function, the Lennard-Jones type function repre- 
sented by the following equation can be used. 



30 j 

In the formula, symbol T is a sequential number representing a probe atom; and symbol "j" is a sequential number 
for the biopolymer atoms. Symbols "A" and "B" are parameters for determining the location and the magnitude of min- 
imum potential. Symbol "ry" is an inter-atomic distance between the Mh probe atom and the y-th atom in the biopolymer. 
35 As the parameters "A" and "B," the values reported by Weiner et al. can be used (Weiner, S.J. Kollman, P.A., Case, D.A., 
Singh, U.C., Ghio, C. Alagona, G., Profeta, S., Jr. and Weiner, P., J.Am. Chem. Soc., 106, 1984, pp. 765-784). 

The van der Waals interaction energy between the trial compound and the biopolymer, when the trial compound is 
placed on the three-dimensional grid points, can be calculated by the following equation: 

40 N 

E vdw = ^ vdw,m 0 

wherein "m" is the sequential number for each atom in the trial compound; N is the total number of atoms in the trial 
45 compound; and "m 0 " is the sequential number of the three-dimensional grid point that is closest to the m-th atom. 

The electrostatic interaction energy that generates between each probe atom placed on each three-dimensional 
grid point and the biopolymer can be calculated by the following equation: 



so Gefc^Z Kq i ,Er a 

j 

wherein symbols Vi." and "ry" are the same as those defined above; "qj" is the atomic charge of the y-th atom in the 
biopolymer; "K" is a constant for converting an energy unit; and " e " is a dielectric constant. As the dielectric constant, 
55 a fixed value may be used, or alternatively, it is preferably to use a value depending on ry, as proposed by Warshel et 
al. (Warshel, A., J. Phys. Chem., 83, 1979, pp. 1640 -1652). 

The electrostatic interaction energy between the trial compound and the biopolymer, when the trial compound is 
placed on the three-dimensional grid points, can be calculated by the following equation: 
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N 
m=1 

5 wherein m, N and m 0 are as defined above. 

The term "hydrogen-bonding grid point information" means information indicating that either hydrogen-donor or 
hydrogen-acceptor atom can form a hydrogen bond with the hydrogen-bonding functional group in the biopolymer if it 
is placed on the three-dimensional grid point, or alternatively that both hydrogen-donor and hydrogen-acceptor atom 
can (or cannot) form a hydrogen bond with the hydrogen-bonding functional group in the biopolymer if it is placed on 

10 the three-dimensional grid point. For example, the hydrogen-bonding information of three-dimensional grid points can 
be expressed by assigning number 1 for a three-dimensional grid point as a hydrogen-bond acceptor, number 2 for a 
three-dimensional grid point as a hydrogen-bond donor, number 3 on a three-dimensional grid having both properties, 
and number 0 on a three-dimensional grid having no hydrogen-bonding property. 

The hydrogen-bonding grid point information can be obtained as set forth below. If the distance "DP" between a 

15 three-dimensional grid point "P" and a hydrogen-donating atom "D" in the biopolymer is within a range (e.g., 2.5-3.1 A) 
that allows for the formation of a hydrogen bond, and if the angle ZDHP made by the three-dimensional grid point "P" , 
hydrogen "H," and atom "D" is within a range (e.g., more than 30°) that allows for the formation of a hydrogen bond, the 
three-dimensional grid point is determined to have property of hydrogen-bond acceptor. Similarly, if the distance "PA" 
between three-dimensional grid point "P" and hydrogen-accepting atom "A" in the biopolymer is within a range that 

20 allows for the formation of a hydrogen bond and if the angle ZALP which is made by the three-dimensional grid point 
"P," a lone-pair electron "L," and the hydrogen-accepting atom "A" is within a range that allows for the formation of a 
hydrogen bond, the three-dimensional grid point is determined to have the property of hydrogen-bond acceptor. If a 
three-dimensional grid point is proved to have neither the property of hydrogen-bond donor nor hydrogen-bond accep- 
tor, the point is assumed to have no hydrogen-bonding property. 

25 Hydrogen-bonding functional groups that are expected to form hydrogen bonds with the trial compound are 
selected from hydrogen-bonding functional groups present in the region Of the biopolymer specified in the step S6 (S8). 
If a lot of hydrogen-bonding functional groups are present, they can be selected depending on the degree of impor- 
tance. Furthermore, one or more dummy atoms are set to each hydrogen-bonding functional group selected in step S8 
based on the three-dimensional grid point information calculated in S7 (S9). 

30 In this step, a region in which a hydrogen-bond can be formed with each hydrogen-bonding functional group 
selected in step S8 (hydrogen-bonding region) is first determined based on the hydrogen-bonding grid point information 
calculated in step S7, and then a dummy atom is placed at the center of the hydrogen -bonding region comprising 
appropriate numbers of three-dimensional grid points, e.g., 5-20. The dummy atom is positioned within the hydrogen- 
bonding region and outside the van der Waals radii of other atoms. The hydrogen-bonding region is defined as a group 

35 of three-dimensional grid points that are neighboring to each other and have the same hydrogen-bonding properties. It 
should be noted that two or more dummy atoms may sometimes be placed from a single hydrogen-bonding functional 
group, or in contrast, no dummy atom may be generated. If the number of generated dummy atoms is large, it is pre- 
ferred to reduce the number to 10 or less, more preferably 5 to 10, by selecting dummy atoms present in a bottom of 
the cavity of the ligand-bonding region considering the importance. To each dummy atom, the same hydrogen-bond 

40 property is assigned as that of the hydrogen-bonding region to which the dummy atom belongs. 

Then, a trial compound is selected (S10) and then three-dimensional coordinates, hydrogen-bonding category 
numbers, information for molecular force-field calculation, and information for conformation generation are read from 
the "ADAM" format database prepared in S1 (S11). Then, correspondences are made combinatorially between the 
dummy atoms and the hydrogen-bonding heteroatoms in the trial compound (S12). If the number of dummy atoms is 

45 "m" and the number of hydrogen-bonding heteroatoms in the trial compound is "n" , the number of the correspondences 
(combination) represented by N(i), in which number of hydrogen-bonds formed is V, is mPixnCi wherein symbol "P" 
represents permutations and symbol "C" represents combinations. It is preferred to generate all of the possible combi- 
nations of the dummy atoms and the hydrogen-binding heteroatoms in the trial compound. 

Then, for all is that satisfy the- relationship l mjn ^ i ^ | maXf the steps S12-S39 are repeated. As a result, a total of 

50 

^ max 

I WW 

55 sets of correspondences of dummy atoms and hydrogen-bonding heteroatoms in the trial compound are generated. By 
these procedures, all possible combinations of the hydrogen-bonds formed between the biopolymer and the trial com- 
pound are generated, and thereby all the binding modes between the trial compound and the biopolymer can be 
searched systematically and efficiently. It is preferred to set l max as the smaller number of the dummy atoms or the 
hydrogen-bonding heteroatoms, and set l m j n as the number smaller than l max by 2-4. 
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More specifically, one of the correspondences of the dummy atoms and the hydrogen-binding heteroatoms gener- 
ated in S12 is selected (S13). In this step, correspondences in which the hydrogen-bonding properties of the dummy 
atoms do not match those of the hydrogen-bonding heteroatoms in the trial compound are not selected. Then, intera- 
tomic distances are calculated for all pairs of dummy atoms contained in the correspondence selected in S13 (S14). If 

5 both the number of dummy atoms and the number of hydrogen-bonding heteroatoms in the trial compound are 1 , the 
process jumps to step S22 skipping steps S14-S21 and then jumps to step S26 skipping steps S23-S25. If either of the 
number of dummy atoms or the number of hydrogen-bonding heteroatoms in the trial compound is 1 , the process jumps 
to step S22 skipping steps S1 4-S21 . 

The trial compound molecule is divided into a hydrogen-bonding part and a non-hydrogen-bonding part (S15), and 

10 rotatable bonds and their rotation modes in the hydrogen-bonding part divided in step S15 are selected (S16). Then, 
conformations of the trial compound are generated successively by rotating the bonds selected in step S16 according 
to the rotation modes inputted in the three-dimensional "ADAM" format database prepared in S1 (S1 7). For each gen- 
erated conformation, subsequent steps S18-S29 are performed. 

These steps will be detailed below. For each conformation generated in step S17, interatomic distances are calcu- 

75 lated for ail pairs of hydrogen-bonding heteroatoms in the trial compound contained in the correspondence selected in 
step S1 3 (S1 8). The interatomic distances of hydrogen-bonding heteroatoms are compared with those of corresponded 
dummy atoms. If the value of "F", which is a sum of the squares of the differences between the dummy-atom/dummy- 
atom distances as calculated in step S14 and the heteroatom/heteroatom distances of the trial compound as calculated 
in step 18, is outside a given range, the correspondence is excluded as inappropriate in the conformation (S19). That 

20 is, when F is calculated by the following equation: 

n(n-1)/2 

/-I 

25 

hydrogen-bonding correspondences and conformations of the trial compound which do not satisfy the following condi- 
tion are excluded: 

30 

wherein M n" is the number of hydrogen bonds; "r di " is the Mh distance between dummy atoms; and V is the Mh dis- 
tance between hydrogen-bonding heteroatoms in the trial compound; is the lower limit of F; and "kg'" is tne upper 
limit of F. The value of kg is preferably 0.6-0.8 and that of k a ' is preferably 1.2-1 .4. In this step, possible hydrogen-bond- 
ing schemes between the biopolymer and the trial compounds and possible conformations of the ligand molecule can 

35 be searched efficiently and thoroughly. 

Then, for the hydrogen-bonding correspondences and conformations not excluded in step S19, the conformation 
of the hydrogen-bonding part of the trial compound is optimized by minimizing the value of F (S20). With this step, the 
conformation of the trial compound is corrected so that the biopolymer and the trial compound can form stable hydrogen 
bonds. The conformation of the hydrogen-bonding part of the trial compound can be optimized by minimizing the value 

40 of F using the Fletcher- Powell method (Fletcher, R., Powell, M.J.D.. Computer J., 6, 1968, p.163), treating the torsion 
angles of the rotatable bonds present in the hydrogen-bonding part as variables. 

Then, the intramolecular energy is calculated for the hydrogen-bonding part of the trial compound optimized in step 
S20, and the conformations with intramolecular energies higher than given threshold are excluded (S21). For example, 
when the intramolecular energy of the hydrogen -bonding part of the trial compound is calculated by using the molecular 

45 force field of AMBER 4.0, conformations with an intramolecular energy of more than 100 kcal/mol are preferably 
excluded. Then, the atomic coordinates of the trial compound is converted to the coordinate system of the biopolymer 
so that the coordinates of the hydrogen-bonding heteroatoms in the trial compound in the conformation obtained in step 
S21 coincide with those of the corresponded dummy atoms (S22). The Kabsh's method of least squares can be used 
for this step (Kabsh, W„ Acta Cryst., A32, 1976, p.922; Kabsh W., Acta Cryst, A34, 1978, p.827). By these procedures, 

50 likely hydrogen-bonding schemes and the conformations of the hydrogen-bonding part of the trial compound can be 
estimated roughly at the same time. 

Following the above step, the intermolecular interaction energy between the biopolymer and the hydrogen-bonding 
part of the trial compound (the sum of a van der Waals interaction energy and an electrostatic interaction energy) and 
the intramolecular energy of the hydrogen -bonding part of the trial compound are calculated (S23). The intermolecular 

55 interaction energy between the biopolymer and the hydrogen-bonding part of the trial compound E inter can be calcu- 
lated by the following equation: 
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inter 

k 

s wherein "k" is a sequential number for the atoms in the trial compound; "GvdwfK))" and "G©| C (ko)" are the van der Waals 
and the electrostatic interaction energies of the grid point information at the grid point at kn closest to the atom "k"; "qk" 
is the atomic charge of the atom k. 

The intramolecular energy of the hydrogen-bonding part of the trial compound E intra can be calculated by known 
methods. For example, E intra can be calculated by the following equation using a reported force field such as 

10 AMBER4.0: 



E intra- £ ^y^^^^L^I ' B l R l )+£(Q,g/efl,) 

dihedrals n /'<;' i<j 

15 

wherein "V n " is a constant provided for force-field atom types of four atoms constituting a torsion angle; "n" is a constant 
representing the symmetry of the torsion-angle potential; 3> is a torsion angle, y is the phase of the torsion-angle poten- 
tial (provided for force-field atom types of four atoms constituting a torsion angle); "Ay" and "By" are constants for a pair 
of force-field atom types of the Mh and /-th atoms in the trial compound- "Ry" is the distance between the Mh and /-th 

20 atoms; % is the atomic charge on the Mh atom in the trial compound; qj is the atomic charge on the /-th atom in the trial 
compound, and e is a dielectric constant. 

Conformations of the trial compound are excluded if the sum of the intermolecular interaction energy and intramo- 
lecular energy as calculated in step S23 is higher than a given threshold (S24). For example, conformations in which 
the sum of these energies is higher than 10 kcal/mol per 100 dalton of molecular weight can be excluded. Then, the 

25 structure of the hydrogen-bonding part of the trial compound is optimized if it was not excluded in step S24 (S25). The 
structures of the hydrogen-bonding part of the trial compound can be optimized by optimizing the torsion angles of the 
hydrogen-bonding part of the trial compound as well as the relative location and orientation of the trial compound. 

For example, the structure of the hydrogen-bonding part of the trial compound can be optimized by the Powell 
method (Press, W.H., Flannery, B.P, Teukolsky, S.A., Vitterling, W.T., "Numerical Recipes in C", Cambridge University 

30 Press, Cambridge, 1989) or the like, so that the total energy E tota | calculated by the following formula is minimized: 

E total = E inter +E intra + W hb * N hb * C hb 

wherein E jnter and E intra are the same as those defined above; "W hb " is a weight; "N hb " is the number of hydrogen- 
35 bonds; and "C nb " is a constant energy value assigned for one hydrogen-bond (e.g., 2.5 kcal/mol). 

Then, conformations of the trial compounds are successively generated by rotating the rotatable bonds in the non- 
hydrogen-bonding part of the trial compound according to the rotation mode inputted in the "ADAM" format three- 
dimensional database prepared in S1 (S26), and steps S27-S30 are repeated for each generated conformation. Spe- 
cifically, the intermolecular interaction energy between the biopolymer and the trial compound and the intramolecular 
40 energy of the trial compound are calculated (S27), and conformations of the trial compound are excluded if the sum of 
the intermolecular interaction energy and intramolecular energy as calculated in S27 is higher than a given threshold 

(528) . The entire structure of the trial compound having conformations not excluded in step S28 are then optimized 

(529) , and the energies are calculated for docking structures of the biopolymer and the trial compound obtained by S29, 
and the most stable docking structure with the minimum energy is selected (S30). 

45 The intermolecular interaction energy and intramolecular energy can be calculated in the same manner as S23, 
provided that energies should be calculated for the whole structure of the trial compound in S27, whereas energies are 
calculated only for the hydrogen-bonding part of the trial compound in S23. As the upper limit for the sum of these ener- 
gies, for example 10 kcal/mol per 100 dalton of molecular weight can be used. Through this step, stable docking struc- 
tures comprising the biopolymer and the trial compound as well as active conformations of the trial compound can be 

so obtained. The optimization of the structures of the trial compound can be carried out in the same manner as S25, pro- 
vided that the optimization should be performed for the whole structure of the trial compound in S29, whereas the opti- 
mization is directed only to the hydrogen-bonding part of the trial compound in S25. 

The docking process of the biopolymer and the trial compound as described above can be carried out by using the 
automated docking program "ADAM" (PCT International Publication WO93/20525; Yamada, M. and Itai, A., Chem. 

55 Pharm. Bull., 41, 1993, p.1200; Yamada, M. and Itai, A., Chem. Pharm. Bull., 41, 1993, p.1203; Yamada Mizutani, M., 
Tomioka, N., and Itai, A..J. Mol. Biol.. 243, 1994, pp.310-326). 

Whether or not the trial compound should be selected as a ligand-candidate compound is decided based on all or 
part of the given criteria including the intermolecular interaction energy between the biopolymer and the trial compound, 
the intramolecular energy of the trial compound, number of hydrogen bonds, number of rings and others, which is eval- 
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uated in the most stable docking structure obtained in S30 (S31). Examples of the intermolecular interaction energy 
include electrostatic interaction energy, van der Waals interaction, hydrogen bond energy, and sum of them. For exam- 
ple, a trial compound with sum of the electrostatic and van der Waals interaction energies below -2 kcal/mol per molec- 
ular weight of 100 dalton can be selected as a ligand-candidate compound. Then, judgement is performed whether or 

5 not all of the trial compounds have been subjected to the selection (S32), and if not all of the trial compounds have been 
subjected to the selection, the steps of S1 0-S31 are repeated by returning to the step S1 0, and if the selection has been 
completed, the process proceeds to the step of S33. 

More promising ligand-candidate compounds are further selected from the ligand-candidate compounds selected 
in S31 (S33). Prior to the further selection, it is convenient to decide another criteria for the further selection by referring 

10 to a list of information about the ligand-candidate compounds in their most stable docking structures, e.g., the intermo- 
lecular interaction energy between the biopolymer and the trial compound, the intramolecular energy of the trial com- 
pound, the number of hydrogen bonds, and the number of rings and others. For example, a further selection is 
preferably carried out using more severe criteria comprised of the number of hydrogen bonds that are formed between 
the biopolymer and the ligand-candidate compound, the intermolecular interaction energy between the biopolymer and 

15 the ligand-candidate compound, and the number of atoms. 

Examples of the intermolecular interaction energy include electrostatic interaction energy, van der Waals interac- 
tion, and sum of them. For example, trial compounds with the sum of the electrostatic and van der Waals interaction 
energies below -8 kcal/mol per molecular weight of 100 dalton are preferably selected as ligand-candidate compounds. 
Although arbitrary number can be specified as a criterion of the number of hydrogen bonds depending on the sort of 

20 the biopolymer, it is desirable to apply a number of, for example, 2 or more, more preferably 3 or more as the criterion. 
Criterion about the number of atoms may also be arbitrary, but the number of, for example, 20 or more, preferably in the 
range of 20 to 40 can be specified as the criterion. 

If too many dummy atoms are generated from the biopolymer, or if the trial compound has rather high conforma- 
tional flexibility or has a lot of heteroatoms, according to a more preferred embodiment of the present invention, infor- 
ms mation about the trial compound is inputted according to the step S9, and then steps S13-S25 are carried out for a 
partial structure of the trial compound and preserve information about correspondences of dummy atoms and hydro- 
gen-bonding heteroatoms of the trial compound that provide impossible hydrogen-bonding schemes and hydrogen- 
bonding heteroatoms that cannot be corresponded with any dummy atom in the partial structure. Although the partial 
structure of the trial compound can be arbitrarily specified without any structural restrictions, a partial structure contain- 

30 ing at least three hydrogen-bonding functional groups is preferred. Presence or absence of conformational flexibility in 
the partial structure may be disregarded. 

Subsequent to the above step, the steps of S13-S33 are performed for the whole structure of the trial compound, 
excluding, from the set of correspondences prepared in the step S12, the correspondences containing the hydrogen- 
bonding heteroatoms and the correspondences of dummy atoms and hydrogen-bonding heteroatoms preserved in the 

35 previous step. As a result, the number of the correspondences and conformations to be examined can be reduced, 
thereby time for structural searching for stable docking structure of the biopolymer and the trial compound can be 
greatly shortened (this method is sometimes referred to as the "Pre-Pruning method (PP method)"). The PP method is 
based on the assumption that any hydrogen-bonding schemes and ligand conformations that are impossible in the par- 
tial structure of the trial compound are also impossible in the whole structure of the trial compound. By applying the PP 

40 method, the time for the search can be remarkably shortened without affecting the accuracy and reliability of the 
resulted docking structure of the biopolymer and the trial compound. 

Example 

45 In order to verify the effectiveness of the method of the present invention, calculations were carried out using a dihy- 
drofolate reductase as the target biopolymer as to whether or not the substrates or known inhibitors, or analogues 
thereof can be selected as ligand-candidate compounds. 

As the atomic coordinates of the dihydrofolate reductase (hereinafter referred to as "DHFR"), the values obtained 
from the atomic coordinates of the binary complex of E. coli DHFR and methotrexate available from the Protein Data 

so Back entry 4DFR were used. The atomic coordinates of a cofactor NADP + molecule were added to the atomic coordi- 
nates of the binary complex on the basis of the crystal structure of the ternary complex of DHFR, folic acid, and NADP\ 
All water molecules were eliminated from the atomic coordinates of the binary complex, except for the two water mole- 
cules that strongly bind to DHFR and cannot be chemically removed. One of the two water molecules not eliminated 
binds to tryptophan 21 and aspartic acid 26 of DHFR through hydrogen-bonds, and the other binds to leucine 114 and 

55 threonine 116. 

As a database of trial compounds, the Available Chemicals Directory (ACD-3D, MDL) containing about 110,000 
molecule data was used. Among the compounds stored in the database, the following molecules were excluded: mol- 
ecules with 5 or less atoms; molecules consist of only H, C, Si and halogen atoms; molecules containing atoms other 
than H, C, N, O, S, P and halogen atoms; molecules with 6 or more rotatable bonds; and molecules having no ring struc- 
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ture, and 10,017 compounds were subjected to the search. The data were supplemented with hydrogen atoms, and 
force-field atom-type numbers, hydrogen-bonding category numbers, atomic charge, and bond-rotation modes (rotation 
with a rotation angle of 120°) were assigned. The criteria for the selection of ligand -candidate compounds were as fol- 
lows: the intermolecular interaction energy between the target biopolymer (the enzyme) of -7 kcal/mol or less per 

5 molecular weight of 100 dalton, 3 or more intermolecular hydrogen bonds, and 25 or more number of atoms. 

According to the method of the present invention, about 1400 trial compounds were finally selected as ligand-can- 
didate compounds, and the process required about 80 hours from inputting of three-dimensional coordinates of the 
biopolymer until the completion of the selection of the ligand-candidate compounds. These ligand-candidate com- 
pounds included the compounds shown in Figure 3, together with other numerous analogues to substrates and known 

w inhibitors. An inhibitor, methotrexate, and the substrate, dihydrofolate, were not selected from this search because of 
the limitation about the number of rotatabie bonds, while trimethoprim was selected as a ligand candidate compound. 
The binding modes and the conformations of the ligand candidate compounds selected by the method of the present 
invention agreed well with the binding modes and the conformations deduced from the crystal structures of the com- 
plexes comprising the substrate or the inhibitor analogue together with the enzyme. 

is According to the method of the present invention, numerous ligand-candidate compounds with novel structures 
quite dissimilar to the substrate and the known inhibitors were selected. A part of examples thereof are shown in Figure 
4 together with the estimated docking structures. In the figure, the bold line indicates the molecular skeletons of the 
selected ligand-candidates , the dotted line indicates hydrogen bonds, the fine solid line indicates a part of the biopol- 
ymer, and the cage depicted with the broken line indicates regions allowed for the centers of the atoms of the ligand. 

20 From these docking structures, it was found that that several hydrogen bonds were formed and high complementarities 
with the enzyme cavity were achieved by selected ligand-candidates. Therefore, these ligand candidate compounds are 
promising lead compounds as inhibitors against dihydrofolate reductase. 

Industrial Applicability 

25 

According to the method of the present invention, ligand compounds to the target biopolymer whose structure is 
already known, can be identified from a three-dimensional structure database in a short time, while considering confor- 
mational flexibility, thereby extremely reliable searching results can be easily obtained. The method of the present 
invention is thus useful in the field of medical and biological sciences for the creation of lead compounds having activi- 
30 ties as inhibitors, agonists, or antagonist to biopolymers. 

Claims 

1. A method of searching one or more ligand compound to a target biopolymer from a three-dimensional structure 
35 database, which comprises the steps of: 

(i) the first step of assigning hydrogen-bonding category numbers, information for calculating force-field energy, 
and information for generating conformations, to two or more trial compounds, in addition to atomic three- 
dimensional coordinates thereof; 
40 (jj) the second step of preparing physicochemical information about a ligand-binding region and one or more 

dummy atoms based on the three-dimensional atomic coordinates of the target biopolymer; 

(iii) the third step of estimating the most stable docking structure through structural optimization, wherein said 
step further comprises the steps of generating possible docking structures by docking a trial compound to the 
biopolymer while varying conformations of the trial compound, and evaluating interaction energy between the 

45 target biopolymer and the trial compound, based on the hydrogen-bonding category number, the information 

for calculating force-field energy, and the information for generating conformations assigned to the trial com- 
pound in addition to the three-dimensional coordinates according to the step (i) and based on the physico- 
chemical information about the ligand-binding region prepared in the step (ii): 

(iv) the fourth step of deciding whether or not the trial compound should be adopted as a ligand candidate com- 
50 pound based on given criteria including the interaction energy values between the target biopolymer and the 

trial compound in the most stable docking structure found according to the step (iii); and 

(v) the fifth step of repeating the step (iii) and the step (iv) for all of the trial compounds. 

2. The method according to claim 1 , wherein the third step further comprises the steps of: 

55 

(1) step (a) of searching likely binding modes between the biopolymer and the trial compound by preparing all 
of the combinatorial correspondences between hydrogen -bonding heteroatoms in the trial compound and 
dummy atoms placed at the positions of heteroatoms that can be hydrogen-bonding partners of hydrogen- 
bonding functional groups in the biopolymer; 
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(2) step (b) of simultaneously estimating possible hydrogen-bonding schemes between the biopolymer and the 
trial compound and conformations of hydrogen-bonding part of the trial compound by comparing the distances 
between dummy atoms with the distances between the corresponded hydrogen-bonding heteroatoms while 
systematically changing the conformations of the trial compound; and 

(3) step (c) of obtaining the structure of a docking structure comprising the biopolymer and the trial compound 
by converting the atomic coordinates of the trial compound to the coordinate system of the biopolymer for each 
hydrogen-bonding scheme and conformation obtained in the step (b) based on the correspondences between 
the hydrogen-bonding heteroatoms in the trial compound and the dummy atoms. 

The method according to claim 1, wherein the third step further comprises the steps of: 

(1) step (a) of searching likely binding modes between the biopolymer and the trial compound by preparing all 
of the combinatorial correspondences between hydrogen-bonding heteroatoms in a partial structure of the trial 
compound and the dummy atoms placed at the positions of heteroatoms that can be hydrogen-bonding part- 
ners of hydrogen-bonding functional groups in the biopolymer; 

(2) step (b) of simultaneously estimating hydrogen-bonding schemes between the biopolymer and the trial 
compound and conformations of the partial structure of the trial compound by comparing the distances 
between dummy atoms with the distances between the corresponded hydrogen-bonding heteroatoms while 
systematically changing the conformations of the trial compound; 

(3) step (c) of preserving correspondences of the dummy atoms and the hydrogen-bonding heteroatoms that 
provide impossible hydrogen-bonding schemes in the partial structure of the trial compound, and hydrogen- 
bonding heteroatoms that cannot form any hydrogen-bond with the dummy atoms based on the hydrogen- 
bonding schemes and conformations obtained in the step (b); 

(4) step (d) of searching likely binding modes between the biopolymer and the trial compound by preparing all 
of the combinatorial correspondences between dummy atoms and hydrogen-bonding heteroatoms in a the 
whole structure of the trial compound, excluding the correspondences containing the hydrogen-bonding heter- 
oatoms preserved in the step (c) and the correspondences containing the dummy atoms and hydrogen-bond- 
ing heteroatoms preserved in the step (c); 

(5) step (e) of simultaneously estimating possible hydrogen-bonding schemes between the biopolymer and the 
trial compound and conformations of hydrogen-bonding part of the trial compound by comparing the distances 
between dummy atoms with the distances between corresponded hydrogen -bonding heteroatoms while sys- 
tematically changing the conformations of the trial compound; and 

(6) step (f) of obtaining the structure of a docking structure comprising the biopolymer and the trial compound 
by converting the atomic coordinates of the trial compound to the coordinate system of the biopolymer for each 
hydrogen-bonding scheme and conformations obtained in the step (e) based on the correspondences between 
the hydrogen-bonding heteroatoms in the trial compound and the dummy atoms. 

The method according to claim 1 , wherein the third step further comprises the steps of: 

(1) step (a) of searching likely binding modes between the biopolymer and the trial compound by preparing all 
of the combinatorial correspondences between hydrogen-bonding heteroatoms in the trial compound and 
dummy atoms placed at the positions of heteroatoms that can be hydrogen-bonding partners of hydrogen- 
bonding functional groups in the biopolymer; 

(2) step (b) of simultaneously estimating possible hydrogen-bonding schemes between the biopolymer and the 
trial compound and conformations of hydrogen -bonding part of the trial compound by comparing the distances 
between dummy atoms with the distances between hydrogen-bonding heteroatoms while systematically 
changing the conformations of the trial compound; 

(3) step (c) of optimizing the conformations of the trial compound so that the positions of the dummy atoms and 
the hydrogen-bonding heteroatoms in the trial compound can be in accord with each other while retaining the 
hydrogen-bonding scheme obtained in the step (b), and then excluding the conformations of the trial com- 
pound having high intramolecular energies; 

(4) step (d) of obtaining the structure of a complex comprising the biopolymer and the trial compound by con- 
verting the entire atomic coordinates of the trial compound to the coordinate system of the biopolymer for each 
conformation not excluded in the step (c) based on the corresponding relationships between the hydrogen- 
bonding heteroatoms in the trial compound and the dummy atoms; 

(5) step (e) of excluding from docking structures of hydrogen-bonding parts obtained in the step (d) the docking 
structures in which intramolecular energies of the hydrogen-bonding parts of the trial compound and intermo- 
lecular energies between the biopolymer and the hydrogen-bonding parts of the trial compound are high, and 
then carrying out structure optimizations of the remaining docking structures; 
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(6) step (f) of obtaining new docking structures comprising the whole trial compound by generating the confor- 
mations of non-hydrogen-bonding parts of the trial compound for each docking structure obtained in the step 
(e); and 

(7) step (g) of excluding from the docking structures obtained in the step (f) the docking structures in which 
intramolecular energies of the whole structure of the trial compound and intermolecular energies between the 
biopolymer and the trial compound are high, and then carrying out structure optimizations of the remaining 
docking structures. 

A ligand compound for a biopolymer which is retrieved by a method according to any one of claims 1 to 4. 

A database containing information of three-dimensional coordinates of trial compounds, which comprises informa- 
tion selected from the group consisting of atomic type numbers, hydrogen bonding category numbers, atomic 
charge, and bond rotation schemes, wherein the information being appended to the three-dimensional coordinates 
of the trial compounds, and which is optionally appended with information of three-dimensional coordinates of 
hydrogen atoms. 

A database according to claim 6 which is used for retrieving ligand compounds for biopolymers. 
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Fig. 3 
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Fig. 4 
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